The genome sequence of black poplar, Populus nigra subsp. betulifolia L., 1753 (Salicaceae)

We present a genome assembly from an individual of Populus nigra subsp. betulifolia (black poplar; Tracheophyta; Malpighiales; Salicaceae). The genome sequence has a total length of 413.2 megabases. Most of the assembly (99.73%) is scaffolded into 19 chromosomal pseudomolecules. Mitochondrial and plastid genomes were also assembled: the three mitochondrial assemblies have lengths of 281.85, 335.57 and 186.15 kilobases, and the plastid genome has a length of 156.37 kilobases.


Background
The black poplar, Populus nigra L., is a large deciduous tree with a widespread distribution across Europe, northern Africa and east to central Asia (POWO, 2024; Stanton et al., 2010). It thrives in riparian ecosystems, where it shows a remarkable tolerance to high water levels and sediment movement, and it is often among the first species to colonise bare moist soil on riverbanks, making it vital to the establishment of riparian forests along river margins and in floodplains (Imbert & Lefèvre, 2003; Lefèvre et al., 1998). Such forests are not only biodiversity hotspots, but are also essential for the natural control of flooding, the prevention of bank erosion and the maintenance of good water quality (Van Looy et al., 2013).
Regrettably, the once widespread native P. nigra is now one of the most endangered tree species in Europe, mainly because of the loss of its natural alluvial habitat and because of competition and introgression with exotic poplar species (Cagelli & Lefevre, 1995; Fossati et al., 2004; Vanden Broeck et al., 2005). For these reasons, P. nigra merits particular attention in conservation efforts. In Britain, the wild-growing black poplar (i.e. P. nigra subsp. betulifolia) is one of the rarest tree species, with only about 7000 individuals. In contrast, cultivated black poplars are among the most widely planted trees in temperate latitudes, sometimes as the pure species but often as interspecific hybrids (e.g. Populus × canadensis Moench, a cross with the American species P. deltoides (eastern cottonwood)). Such hybrids are renowned for their rapid growth and high biomass yield, as well as their effective coppice regeneration and vegetative propagation, making them suitable crops for the production of a great variety of products including wood, biofuels, and pulp and paper (Balatinecz & Kretschmann, 2001; Dickmann, 2006). As a pioneer species, black poplar also provides a range of ecosystem services, including phytoremediation in polluted industrial zones, erosion control in river systems, and carbon sequestration through reforestation of lowlands in temperate regions (De Rigo et al., 2016; Di Baccio et al., 2003; Zalesny et al., 2016). Besides their economic and ecological importance, poplars (including P. nigra) have become an important model genus for tree biotechnology (Douglas, 2017; Ellis et al., 2010), in part because of their relatively small genome and the ease with which they can be transformed.
Across its range, P. nigra displays remarkable phenotypic variation, but spontaneous hybridization makes taxonomic classification complex (Cagelli & Lefevre, 1995). Three subspecies are currently accepted (POWO, 2024): P. nigra subsp. caudina (Ten.) Bugała, found across Mediterranean Europe; P. nigra subsp. betulifolia, found in northwestern Europe (Ireland, Britain, western France); and P. nigra subsp. nigra, found across the species' range but most common in central and eastern Europe and Central Asia. Numerous forms and cultivars are known; many are easily propagated vegetatively and are extensively planted as avenue trees and windbreaks across Eurasia and beyond. The species is naturalised in North America, Argentina, South Africa, the Himalayas, East Asia and eastern Australia (POWO, 2024). One of the most recognisable forms is the fastigiate Lombardy poplar, P. nigra cv. Italica. This cultivar was selected in Lombardy, northern Italy, in the seventeenth century, after which it soon became popular as a parkland tree across Europe. Crosses of this cultivar with subsp. betulifolia (Plantières Group) are the main trees called Lombardy poplar in Britain and Ireland, as they are better suited to the cooler climate.
Here we present the first genome from a wild stand of Populus nigra subsp. betulifolia. We hope this resource will play a pivotal role in the fields of functional genomics, genetic engineering and molecular breeding of this economically significant genus. The reference genome will enable the systematic assessment and characterisation of sequence variation among natural populations of black poplar, without the need to rely on read mapping and variant calling against the reference genome of P. trichocarpa Torr. & A.Gray ex Hook. This will encompass not only single nucleotide polymorphisms (SNPs), but also structural variants, which are known to influence the regulation of complex quantitative traits in poplars (Bastiaanse et al., 2019). The genome will be particularly valuable in large-scale systems genetics analyses such as the ongoing ERC project POPMET (DOI: 10.3030/834923), which aims to integrate genomic, transcriptomic and metabolomic data to identify secondary metabolites, metabolic pathways and their genes in P. nigra. Finally, the availability of a P. nigra reference genome, combined with the existing suite of published Populus reference genomes, will enable precise assessments of synteny, recombination and chromosomal origins. This resource opens up new opportunities for the study of adaptive evolution in long-lived woody species.

Genome sequence report
The genome was sequenced from a female individual of Populus nigra subsp. betulifolia (Figure 1) collected from along the Thames towpath in Barnes, Richmond, Surrey, UK (51.48, -0.23). Using flow cytometry, the genome size (1C-value) was estimated to be 0.52 pg, equivalent to 500 Mb. Pacific Biosciences single-molecule HiFi long reads were generated to a total of 54-fold coverage. Primary assembly contigs were scaffolded with chromosome conformation (Hi-C) data. Manual assembly curation corrected 156 missing joins or mis-joins and removed 11 haplotypic duplications, reducing the assembly length by 1.66% and the scaffold number by 85.12%, and increasing the scaffold N50 by 25.44%. The final assembly has a total length of 413.2 Mb in 21 sequence scaffolds with a scaffold N50 of 22.5 Mb (Table 1). The snail plot in Figure 2 summarises the assembly statistics, while the distribution of assembly scaffolds by GC proportion and coverage is shown in Figure 3. The cumulative assembly plot in Figure 4 shows curves for subsets of scaffolds assigned to different phyla. Most (99.73%) of the assembly sequence was assigned to 19 chromosomal-level scaffolds. Chromosome-scale scaffolds confirmed by the Hi-C data are named in order of size (Figure 5; Table 2). The order and orientation of contigs along chromosome 2 between 4 Mb and 23 Mb are uncertain. Although the assembly is not fully phased, the deposited assembly represents one haplotype; contigs corresponding to the second haplotype have also been deposited. The mitochondrial and plastid genomes were also assembled and can be found as contigs within the multifasta file of the genome submission.
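The scaffold N50 and N90 values and the fold coverage quoted in this section follow standard definitions, which can be illustrated with a short sketch. Assuming the usual definition, N50 is the length of the shortest scaffold among the longest scaffolds that together make up at least 50% of the assembly (N90 uses 90%), and fold coverage is the total number of sequenced bases divided by the genome size. The Python below is a minimal illustration on toy lengths; the function names `nx` and `fold_coverage` are hypothetical, and this is not the pipeline used to produce ddPopNigr1.1.

```python
from typing import Sequence


def nx(lengths: Sequence[int], fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 by default) for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of the
    longest sequences whose combined length reaches `fraction` of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    # Fallback; the loop returns once the target is reached.
    return min(lengths)


def fold_coverage(total_read_bases: int, genome_size_bp: int) -> float:
    """Sequencing depth: total read bases divided by the (assumed) genome size."""
    return total_read_bases / genome_size_bp


# Toy scaffold lengths (hypothetical, not the ddPopNigr1.1 scaffolds).
scaffolds = [50, 40, 30, 20, 10]
print(nx(scaffolds))       # N50 -> 40
print(nx(scaffolds, 0.9))  # N90 -> 20
```

Sorting from longest to shortest and stopping at the first scaffold that pushes the running total past the threshold is what makes N50 robust to many tiny fragments, which is why manual curation that joins contigs (as described above) raises it.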

Sample acquisition, genome size estimation and nucleic acid extraction
A specimen of Populus nigra subsp. betulifolia (specimen ID KDTOL10478, ToLID ddPopNigr1) was collected from along the Thames towpath in Barnes, Richmond, Surrey, UK (latitude 51.48, longitude -0.23) on 2022-05-27. The specimen was collected and identified by Maarten Christenhusz (Royal Botanic Gardens, Kew; RBG Kew) and frozen at -80 °C. The herbarium voucher associated with the sequenced plant is Christenhusz no. 9355, deposited in the herbarium of RBG Kew (K) (K001401000).
The genome size was estimated by flow cytometry using the fluorochrome propidium iodide and following the 'one-step' method as outlined in Pellicer et al. (2021). For this species, the General Purpose Buffer (GPB) supplemented with 3% PVP and 0.08% (v/v) beta-mercaptoethanol was used for isolation of nuclei (Loureiro et al., 2007), and the internal calibration standard was Solanum lycopersicum 'Stupické polní rané' with an assumed 1C-value of 968 Mb (Doležel et al., 2007).
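With an internal standard, the sample's nuclear DNA content is usually estimated from the ratio of the G1 peak positions of the sample and the standard, multiplied by the standard's 1C-value. The sketch below assumes that linear relationship, the 968 Mb tomato standard given above, and the commonly used conversion of 1 pg ≈ 978 Mb. The peak positions and the function names `sample_1c_mb` and `mb_to_pg` are hypothetical and are not the measured data for ddPopNigr1.

```python
# 1C-value of the internal standard given in the text
# (Solanum lycopersicum 'Stupické polní rané').
STANDARD_1C_MB = 968.0
# Commonly used conversion factor between DNA mass and length (1 pg ~ 978 Mb).
MB_PER_PG = 978.0


def sample_1c_mb(sample_peak: float, standard_peak: float,
                 standard_1c_mb: float = STANDARD_1C_MB) -> float:
    """Estimate the sample 1C-value (Mb) from G1 peak positions.

    Assumes both G1 peaks come from the same co-stained preparation, so the
    ratio of their mean fluorescence equals the ratio of their DNA contents.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("peak positions must be positive")
    return standard_1c_mb * sample_peak / standard_peak


def mb_to_pg(mb: float) -> float:
    """Convert a genome size in Mb to picograms of DNA."""
    return mb / MB_PER_PG


# Hypothetical peak positions (arbitrary fluorescence units), not measured data.
est = sample_1c_mb(sample_peak=100.0, standard_peak=200.0)
print(f"{est:.1f} Mb, {mb_to_pg(est):.3f} pg")  # prints 484.0 Mb, 0.495 pg
```

Because the standard is co-processed with the sample, instrument drift largely cancels in the peak ratio, which is the rationale for the internal-standard design described above.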
The workflow for high molecular weight (HMW) DNA extraction at the Wellcome Sanger Institute (WSI) includes a sequence of core procedures: sample preparation, sample homogenisation, DNA extraction, fragmentation, and clean-up. In sample preparation, the ddPopNigr1 sample was weighed and dissected on dry ice (Jay et al., 2023). For sample homogenisation, leaf tissue was cryogenically disrupted using the Covaris cryoPREP® Automated Dry Pulverizer (Narváez-Gómez et al., 2023). HMW DNA was extracted using the Automated Plant MagAttract v3 protocol (Todorovic et al., 2023a). HMW DNA was sheared to an average fragment size of 12-20 kb in a Megaruptor 3 system with speed setting 30 (Todorovic et al., 2023b). Sheared DNA was purified by solid-phase reversible immobilisation (Strickland et al., 2023): in brief, the method uses a 1.8X ratio of AMPure PB beads to sample to eliminate shorter fragments and concentrate the DNA. The concentration of the sheared and purified DNA was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit dsDNA High Sensitivity Assay kit. Fragment size distribution was evaluated by running the sample on the FemtoPulse system. Protocols developed by the WSI Tree of Life core laboratory are publicly available on protocols.io (Denton et al., 2023).

Sequencing
Pacific Biosciences HiFi circular consensus DNA sequencing libraries were constructed according to the manufacturer's instructions. Poly(A) RNA-Seq libraries were constructed using the NEB Ultra II RNA Library Prep kit. DNA and RNA sequencing was performed by the Scientific Operations core at the WSI on Pacific Biosciences Sequel II (HiFi) and Illumina NovaSeq 6000 (RNA-Seq) instruments. Hi-C data were also generated from leaf tissue of ddPopNigr1 using the Arima2 kit and sequenced on the Illumina NovaSeq 6000 instrument.
Table 3 contains a list of relevant software tool versions and sources.

Wellcome Sanger Institute - legal and governance
The materials that have contributed to this genome note have been supplied by a Darwin Tree of Life Partner. Further, the Wellcome Sanger Institute employs a process whereby due diligence is carried out proportionate to the nature of the materials themselves, and the circumstances under which they have been/are to be collected and provided for use.
The purpose of this is to address and mitigate any potential legal and/or ethical implications of receipt and use of the materials as part of the research project, and to ensure that in doing so we align with best practice wherever possible. The overarching areas of consideration are:
• Ethical review of provenance and sourcing of the material
• Legality of collection, transfer and use (national and international)
Each transfer of samples is further undertaken according to a Research Collaboration Agreement or Material Transfer Agreement entered into by the Darwin Tree of Life Partner,

Parimalan Rangan
ICAR-National Bureau of Plant Genetic Resources, New Delhi, India
The present manuscript describes and reports the genome sequence of black poplar. As the species is dioecious, I would appreciate it if the authors could also generate a genome assembly for a male tree.
This will help many downstream analyses and improve understanding at genome scale.
Also, the authors could use the LTR Assembly Index (LAI) to assess the quality of their assembly and categorize whether it is a draft or reference-scale genome. Inclusion of this metric might add value to the manuscript.

Marco Pessoa-Filho
Brazilian Agricultural Research Corporation, Brasília, Brazil
The data note describes the genome assembly of a female individual of Populus nigra subsp. betulifolia, found in northwestern Europe. It is an endangered species and has received attention in conservation efforts. It is a pioneer species which thrives in riparian ecosystems and provides a range of ecosystem services. Other reference genome assemblies are available in the genus, but this is the first for the species/subspecies.
PacBio HiFi reads were obtained (54-fold coverage) and used for assembly with hifiasm. Hi-C reads generated with the Arima2 kit were used to obtain chromosome-scale scaffolds with YaHS. Manual curation was carried out with HiGlass and Pretext. Merqury was used to assess k-mer completeness and QV consensus quality. RNA-seq data were obtained from leaf tissue, but annotation was not reported in the data note.
Raw data and assembly were deposited in INSDC databases and are publicly available.
The rationale for creating the dataset was clearly described.Protocols were appropriate and the work is technically sound.The datasets are clearly presented in a usable and accessible format.
More details on genome assembly, curation and evaluation could be provided in the Materials and Methods; these would enrich the data note and allow its replication:
1) Was hifiasm run with default parameters?
2) Considering that Hi-C data were available, were they used as input to hifiasm for phasing?
3) What output was further used in the pipeline for curation and scaffolding? Was it the p_ctg? The hap1_ctg?
4) Considering that hifiasm already includes purging of haplotype duplications, why was purge_dups used? Was hifiasm run without haplotype duplication removal?
5) What dataset was used as input to Merqury along with the assembly to assess k-mer completeness and QV consensus quality values?

Boas Pucker
Technische Universität Braunschweig, Brunswick, Lower Saxony, Germany
This genome announcement by Christenhusz and Bastiaanse describes the genome sequence of Populus nigra. The project was done by the Darwin Tree of Life Consortium. The genome sequence is certainly a helpful resource and it is good to see all data released.
Here are some comments to improve the manuscript:
○ "The genome sequence is 413.2 megabases in span." ... this sentence should be rephrased.

○ There are some redundant sentences in the introduction. For example, ecosystem services are reported twice.
the sequencing attempts?
The assembly process lacks version numbers for the mentioned tools and also the necessary details to repeat all steps. Please include all parameters used.
Reviewer Expertise: plant genomics, specialized plant metabolism, applied bioinformatics
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Figure 2. Genome assembly of Populus nigra subsp. betulifolia, ddPopNigr1.1: metrics. The BlobToolKit snail plot shows N50 metrics and BUSCO gene completeness. The main plot is divided into 1,000 size-ordered bins around the circumference, with each bin representing 0.1% of the 414,184,750 bp assembly. The distribution of scaffold lengths is shown in dark grey, with the plot radius scaled to the longest scaffold present in the assembly (50,570,738 bp, shown in red). Orange and pale-orange arcs show the N50 and N90 scaffold lengths (22,490,379 and 14,411,003 bp, respectively). The pale grey spiral shows the cumulative scaffold count on a log scale, with white scale lines showing successive orders of magnitude. The blue and pale-blue area around the outside of the plot shows the distribution of GC, AT and N percentages in the same bins as the inner plot. A summary of complete, fragmented, duplicated and missing BUSCO genes in the eudicots_odb10 set is shown in the top right. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddPopNigr1_1/dataset/ddPopNigr1_1/snail.

Figure 3. Genome assembly of Populus nigra subsp. betulifolia, ddPopNigr1.1: BlobToolKit GC-coverage plot. Scaffolds are coloured by phylum. Circles are sized in proportion to scaffold length. Histograms show the distribution of scaffold length sum along each axis. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddPopNigr1_1/dataset/ddPopNigr1_1/blob.

Figure 4. Genome assembly of Populus nigra subsp. betulifolia, ddPopNigr1.1: BlobToolKit cumulative sequence plot. The grey line shows cumulative length for all scaffolds. Coloured lines show cumulative lengths of scaffolds assigned to each phylum using the buscogenes taxrule. An interactive version of this figure is available at https://blobtoolkit.genomehubs.org/view/ddPopNigr1_1/dataset/ddPopNigr1_1/cumulative.
The submission of materials by a Darwin Tree of Life Partner is subject to the 'Darwin Tree of Life Project Sampling Code of Practice', which can be found in full on the Darwin Tree of Life website. By agreeing with and signing up to the Sampling Code of Practice, the Darwin Tree of Life Partner agrees that they will meet the legal and ethical requirements and standards set out within this document in respect of all samples acquired for, and supplied to, the Darwin Tree of Life Project.

Figure 5. Genome assembly of Populus nigra subsp. betulifolia, ddPopNigr1.1: Hi-C contact map of the ddPopNigr1.1 assembly, visualised using HiGlass. Chromosomes are shown in order of size from left to right and top to bottom. An interactive version of this figure may be viewed at https://genome-note-higlass.tol.sanger.ac.uk/l/?d=HKCLL3icTTyCwOPRfeCJ2A.

○ The content of Figure 3 and Figure 4 could be summarized in the text to make the manuscript more concise.
Is the rationale for creating the dataset(s) clearly described? Partly
Are the protocols appropriate and is the work technically sound? Partly
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
RNA was extracted from leaf tissue of ddPopNigr1 in the Tree of Life Laboratory at the WSI using the RNA Extraction: Automated MagMax™ mirVana protocol (do Amaral et al., 2023). The RNA concentration was assessed using a Nanodrop spectrophotometer and a Qubit Fluorometer with the Qubit RNA Broad-Range Assay kit. RNA integrity was analysed using the Agilent RNA 6000 Pico Kit and Eukaryotic Total RNA assay.

Table 3. Software tools: versions and sources.
trees. Proc Natl Acad Sci U S A. 2019; 116(27): 13690-13699.
Cagelli L, Lefevre F: The

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

6) Table 3 lists MerquryFK, not the original Merqury; this should be mentioned in the text.
7) The authors should describe what exactly the pipelines sanger-tol/readmapping and sanger-tol/genomenote do.
Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
○ Background, third paragraph: "...after which is soon became popular" should be "after which it soon became popular".

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Reviewer Report 26 July 2024 https://doi.org/10.21956/wellcomeopenres.23560.r88962
© 2024 Pucker B. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
○ "This resilient species thrives in riparian ecosystems." ... What is the evidence for this broad statement?
○ "Such forests are not only biodiversity hotspots," ... It appears that the cited reference Van Looy et al., 2013 does not conclude (and also does not study in detail) that Populus nigra forests are biodiversity hotspots.
○ "Regrettably," ... Please rephrase, because this reads like activism and not like objective science.